Dynamic Join Product Skew Handling for Hash-Joins in Shared-Nothing Database Systems

Authors

Lilian Harada

Masaru Kitsuregawa

Abstract

When data are uniformly distributed, parallel hash-based join algorithms scale up well. However, the presence of data skew can cause load imbalance among the processors, significantly deteriorating their performance. In this paper we propose a dynamic skew handling algorithm which deals with this load imbalance by detecting and handling join product skews at run-time. The idea is to monitor the join processing at the join phase and compare the average processing rate of each partition with the rate statically predicted at the scheduling phase. If their difference is large enough to produce a significant performance degradation, the processor is considered to be overloaded and a workload compensation strategy is dynamically invoked. In this case, based on the measured average processing rate, the amount of overload caused by the unpredicted join product skew is calculated, and the amount of load to be migrated to the non-overloaded processors is determined. We propose two methods, result redistribution and processing task migration, to handle the load migration from the overloaded processor to the non-overloaded processors. Simulation results show that our dynamic skew handling approach can detect and handle load imbalances efficiently, so that rebalancing the load among the processors results in an almost constant join execution time under different join product skews.
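The detection and load-sizing steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract gives no formulas, so the relative-shortfall test, the `threshold` value, the balanced-finish-time target, and the proportional split among receivers are all assumptions made for illustration, as are the names `Processor`, `is_overloaded`, and `plan_migration`. Migration cost is ignored.

```python
from dataclasses import dataclass


@dataclass
class Processor:
    name: str
    predicted_rate: float  # rate predicted at the scheduling phase (assumed unit: tuples/s)
    measured_rate: float   # average rate measured during the join phase
    remaining: float       # tuples still to be processed on this processor


def is_overloaded(p: Processor, threshold: float = 0.2) -> bool:
    # Assumed test: the measured rate falls short of the predicted rate
    # by more than `threshold`, expressed as a fraction of the prediction.
    shortfall = (p.predicted_rate - p.measured_rate) / p.predicted_rate
    return shortfall > threshold


def plan_migration(procs, threshold=0.2):
    """Return {overloaded_name: {receiver_name: tuples_to_migrate}}."""
    overloaded = [p for p in procs if is_overloaded(p, threshold)]
    if not overloaded:
        return {}

    # Assumed target: the time at which all processors would finish
    # together if the remaining work were spread according to measured rates.
    total_rate = sum(p.measured_rate for p in procs)
    target_time = sum(p.remaining for p in procs) / total_rate

    # Spare capacity of each non-overloaded processor up to the target time.
    spare = {
        p.name: max(0.0, p.measured_rate * target_time - p.remaining)
        for p in procs
        if not is_overloaded(p, threshold)
    }
    total_spare = sum(spare.values())

    plan = {}
    for p in overloaded:
        # Work the overloaded processor cannot finish by the target time.
        excess = max(0.0, p.remaining - p.measured_rate * target_time)
        if excess == 0.0 or total_spare == 0.0:
            continue
        # Split the excess among receivers in proportion to their spare capacity.
        plan[p.name] = {
            name: excess * s / total_spare for name, s in spare.items() if s > 0
        }
    return plan


procs = [
    Processor("A", predicted_rate=100.0, measured_rate=50.0, remaining=1000.0),
    Processor("B", predicted_rate=100.0, measured_rate=100.0, remaining=1000.0),
    Processor("C", predicted_rate=100.0, measured_rate=100.0, remaining=1000.0),
]
print(plan_migration(procs))
```

In this example processor A runs at half its predicted rate, so the sketch flags it as overloaded and assigns 200 tuples each to B and C. Whether the migrated load takes the form of redistributed results or migrated processing tasks, the two methods named in the abstract, is outside the scope of this sketch.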

Similar resources

Handling Data Skew in Multiprocessor Database Computers Using Partition Tuning

Shared-nothing multiprocessor architecture is known to be more scalable to support very large databases. Compared to other join strategies, a hash-based join algorithm is particularly efficient and easily parallelized for this computation model. However, this hardware structure is very sensitive to the data skew problem. Unless the parallel hash join algorithm includes some load balancing mec...

Practical Skew Handling in Parallel Joins

We present an approach to dealing with skew in parallel joins in database systems. Our approach is easily implementable within current parallel DBMS, and performs well on skewed data without degrading the performance of the system on non-skewed data. The main idea is to use multiple algorithms, each specialized for a different degree of skew, and to use a small sample of the relations being join...

Implementation and Analysis of Join Algorithms to handle skew for the Hadoop Map/Reduce Framework

The Map/Reduce framework, a parallel processing paradigm, is widely used for large-scale distributed data processing. Map/Reduce can perform typical relational database operations such as selection, aggregation, and projection. However, binary relational operators like join, Cartesian product, and set operations are difficult to implement with Map/Reduce. Map/Reduce can process homogeneous ...

Efficient Outer Join Data Skew Handling in Parallel DBMS

Large enterprises have been relying on parallel database management systems (PDBMS) to process their ever-increasing data volume and complex queries. The scalability and performance of a PDBMS come from load balancing on all nodes in the system. Skewed processing will significantly slow down query response time and degrade the overall system performance. Business intelligence tools used by ent...

Tradeoffs in Processing Complex Join Queries via Hashing in Multiprocessor Database Machines

In this paper we examine the problem of processing multi-way join queries (on the order of 10 joins) through hash-based join methods in a shared-nothing database machine. We first discuss how the choice of a format for a complex query can significantly affect performance in a multiprocessor database machine. Several query processing algorithms are then proposed and experimental results obtained...

Publication date: 1995
